Bootcamp/Euclid (Chijioke Nna): Week 6 - Price Is Right [CapStone] #2067

Draft: cjayprime wants to merge 2 commits into ed-donner:main from cjayprime:feature/week6


@cjayprime

PR: Add Product Price Estimation Pipeline (Price is Right)

Overview

This PR introduces the Price is Right capstone project, a machine learning pipeline that estimates product prices from text descriptions. The project moves from data exploration and baseline benchmarking to interactive performance visualization, using the ed-donner/items_lite dataset to evaluate how LLM-based regression scales with training data.
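For reference, the evaluation below is framed in terms of Mean Absolute Error. The standard definition is given here, together with what a mean baseline predicts, assuming (as the name suggests, though the PR does not spell it out) that it outputs the training-set mean price for every item:

```math
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert,
\qquad
\hat{y}_i^{\text{baseline}} = \bar{y}_{\text{train}} = \frac{1}{m}\sum_{j=1}^{m} y_j^{\text{train}}
```

Here $y_i$ is the true price of test item $i$, $\hat{y}_i$ the predicted price, $n$ the number of test items, and $m$ the number of training items.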

Key Features

  • Automated Data Ingestion: Fetches and processes the Hugging Face items_lite dataset, converting raw records into structured pandas DataFrames for analysis.
  • Baseline Benchmarking: Implements a statistical "Mean Baseline" model to establish a performance floor (MAE) against which all subsequent fine-tuned models are measured.
  • Interactive Analytics Dashboard: A custom-built Gradio interface that allows users to toggle between price distributions, market segment breakdowns, and model scaling laws.
  • Performance Scaling Analysis: Tracks and visualizes the learning curve, showing how Mean Absolute Error (MAE) decreases as the number of training samples grows from 0 to 5,000.
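A minimal sketch of how a mean baseline, its MAE, and a baseline learning curve over training-set sizes could be computed, assuming prices are plain lists of floats. The helper names (mae, mean_baseline_mae, baseline_learning_curve) are hypothetical and are not the PR's run_baseline_mean; for the fine-tuned models, the mean predictor would be replaced by model predictions.

```python
from statistics import fmean


def mae(y_true, y_pred):
    """Mean Absolute Error: the average of |actual - predicted|."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def mean_baseline_mae(train_prices, test_prices):
    """Predict the mean training price for every test item and score it with MAE.

    Hypothetical helper for illustration; not the PR's run_baseline_mean.
    """
    guess = fmean(train_prices)
    return mae(test_prices, [guess] * len(test_prices))


def baseline_learning_curve(train_prices, test_prices, sizes):
    """Baseline MAE when fit on the first n training prices, for each n in sizes.

    n = 0 has no mean to predict, so it is skipped.
    """
    return {n: mean_baseline_mae(train_prices[:n], test_prices) for n in sizes if n > 0}


print(mean_baseline_mae([10.0, 20.0, 30.0], [15.0, 25.0, 40.0]))  # 10.0
```

The dictionary returned by baseline_learning_curve maps training size to MAE, which can be plotted directly as one line of a learning-curve chart.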

What’s Inside

  • Backend: A data pipeline built on the Hugging Face datasets library, with OpenAI SDK integration to support future fine-tuning.
  • Logic: A custom evaluation suite, including a run_baseline_mean function and pd.cut-based categorization for market segmentation (Budget vs. Luxury).
  • Visuals: A three-part visualization suite built with Seaborn and Matplotlib that renders histograms, pie charts, and regression performance curves.
  • Workflow: A streamlined Jupyter workflow, from .env configuration through to a live Gradio dashboard.
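A minimal sketch of pd.cut-based Budget/Luxury segmentation, assuming a DataFrame with a numeric price column named "price" and a single hypothetical cut-off of 100. The PR does not state the column name or the thresholds it actually uses, and add_market_segment is an illustrative name, not a function from the PR.

```python
import pandas as pd


def add_market_segment(
    df: pd.DataFrame, price_col: str = "price", threshold: float = 100.0
) -> pd.DataFrame:
    """Return a copy of df with a categorical 'segment' column (Budget / Luxury).

    The 'price' column name and the 100.0 cut-off are hypothetical defaults.
    """
    out = df.copy()
    out["segment"] = pd.cut(
        out[price_col],
        bins=[0.0, threshold, float("inf")],  # Budget: up to threshold, Luxury: above it
        labels=["Budget", "Luxury"],
        include_lowest=True,  # so a price of exactly 0 still lands in Budget
    )
    return out


items = pd.DataFrame({"price": [5.0, 99.0, 100.0, 150.0, 999.0]})
segmented = add_market_segment(items)
print(segmented["segment"].value_counts())  # per-segment counts, e.g. for a pie chart
```

Because pd.cut returns a Categorical, the segment labels keep their bin order (Budget before Luxury) when grouped or plotted.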

Screens: [3 screenshots]

CC: @ranskills

@ranskills
Contributor

Thanks, @cjayprime

Open the PR
